Version: 2.32.00

Plugin Configuration Basics

Storage Plugin Attributes

The following graphic shows key attributes of a typical dfs-based storage plugin configuration:

![alt text](image.png)

Using the Formats Attributes

You set formats attributes, such as skipFirstLine, in the formats area of the storage plugin configuration. When setting attributes for text files, such as CSV, you must also set the sys.options property exec.storage.enable_new_text_reader to true (the default). For more information and examples of using formats for text files, see the text file format documentation.

Table Function Parameters

In Datagator X, you can also set the formats attributes defined above on a per-query basis. To pass parameters to the format plugin, use the table function syntax:

select a, b from table({table function name}(parameters))

The table function name is the table name, the type parameter is the format name, and the remaining parameters are the fields that the format plugin configuration accepts, as defined above (except for extensions, which does not apply in this context).

For example, to read a CSV file and parse the header:

select a, b from table(dfs.`path/to/data.csv`(type => 'text', fieldDelimiter => ',', extractHeader => true))
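If you generate such queries programmatically, the table function call can be assembled from a mapping of parameters. The following Python sketch is illustrative only: the helper names `sql_literal` and `table_function_query` are hypothetical and not part of Datagator X. It assumes that string values are single-quoted (with embedded single quotes doubled, as in standard SQL), that booleans and numbers are written bare, and that the table reference is already correctly quoted.

```python
def sql_literal(value):
    """Render a Python value as a SQL literal for a table function parameter."""
    # Check bool before int, because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return str(value)
    # Assumption: strings are single-quoted and embedded quotes are doubled.
    return "'" + str(value).replace("'", "''") + "'"


def table_function_query(columns, table, **params):
    """Build: select <columns> from table(<table>(name => value, ...)).

    `table` must already be a valid, quoted table reference.
    """
    args = ", ".join(
        f"{name} => {sql_literal(value)}" for name, value in params.items()
    )
    return f"select {', '.join(columns)} from table({table}({args}))"


query = table_function_query(
    ["a", "b"],
    "dfs.`path/to/data.csv`",
    type="text",
    fieldDelimiter=",",
    extractHeader=True,
)
print(query)
```

Keyword arguments keep their insertion order, so the parameters appear in the query in the order they are passed.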

Specifying the Schema as Table Function Parameter

Table schemas normally reside in the root folder of each table. You can also specify a schema for an individual query by using a table function and setting the SCHEMA property. You can combine the schema with format plugin properties. The syntax is similar to that of the CREATE OR REPLACE SCHEMA command:

SELECT a, b FROM TABLE (table_name(
SCHEMA => 'inline=(column_name data_type [nullability] [format] [default] [properties {prop='val', ...}])'))

You can specify the schema inline within the query. For example:

select * from table(dfs.tmp.`text_table`(
schema => 'inline=(col1 date properties {`drill.format` = `yyyy-MM-dd`})
properties {`drill.strict` = `false`}'))

Alternatively, you can specify the path to a schema file. For example:

select * from table(dfs.tmp.`text_table`(schema => 'path=`/tmp/my_schema`'))
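The two forms of the schema parameter shown above (inline definition and file path) can also be composed in code. This is a minimal Python sketch under stated assumptions: the helpers `props_clause`, `inline_schema`, and `path_schema` are hypothetical names, not a Datagator X API, and the sketch covers only column names, data types, and properties (not nullability, format, or default clauses).

```python
def props_clause(properties):
    """Render {'k': 'v'} as ' properties {`k` = `v`, ...}', or '' if empty."""
    if not properties:
        return ""
    pairs = ", ".join(f"`{key}` = `{val}`" for key, val in properties.items())
    return f" properties {{{pairs}}}"


def inline_schema(columns, table_properties=None):
    """Build an inline schema value.

    columns: list of (name, data_type, column_properties_or_None) tuples.
    """
    cols = ", ".join(
        f"{name} {dtype}{props_clause(props)}" for name, dtype, props in columns
    )
    return f"inline=({cols}){props_clause(table_properties)}"


def path_schema(path):
    """Build a schema value that points to a schema file."""
    return f"path=`{path}`"


inline = inline_schema(
    [("col1", "date", {"drill.format": "yyyy-MM-dd"})],
    {"drill.strict": "false"},
)
print(f"select * from table(dfs.tmp.`text_table`(schema => '{inline}'))")
print(f"select * from table(dfs.tmp.`text_table`(schema => '{path_schema('/tmp/my_schema')}'))")
```

The first printed query corresponds to the inline example and the second to the schema file example, each written on a single line.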

The following example demonstrates applying a provided schema alongside format plugin table function parameters. Suppose that you have a CSV file with headers and a custom extension, csvh-test. You can combine the schema with format plugin properties:

select * from table(dfs.tmp.`cars.csvh-test`(type => 'text',
fieldDelimiter => ',', extractHeader => true,
schema => 'inline=(col1 date)'))

Using Other Attributes

The configuration of other attributes, such as size.calculator.enabled in the hbase plugin and configProps in the hive plugin, is implementation-dependent and beyond the scope of this document.

Case Sensitivity

In Datagator X, storage plugin names and workspaces (schemas) are case-insensitive. For example, the following statements use a storage plugin named dfs and a workspace named clicks. You can reference dfs.clicks in an SQL statement in uppercase, lowercase, or mixed case, as shown:

USE dfs.clicks;
USE DFS.CLICKs;
USE dfs.CLICKS;